• Motivation: Coping with Information Overload

• Examples of Context and Content

• Random Attributed Graphs

- Dyadic Priors
Reduction

Pick out the good stuff: Filter and Select

Boil it down: Stream Characterization

• Mature: External Metadata

• Emerging: Metacontent
> language
> speaker
> topic
• Associated metadata is interpreted by humans as context

• Humans acquire and use language during interaction with a complex environment, e.g., Roy's Speechome project at MIT

• Call centers have customer profiles

• Voice messages have to/from telephone numbers

• Enron email corpus has date, time, sender, and recipients

• Switchboard dialog corpus has demographics: age, gender, ...

• Citeseer scientific articles have authors and citations
Communication Events from the Enron Corpus


Date        Time      Sender     Receiver   Sender's Rank   Topic
2001-01-02  04:15:00  steven.k   jeff.d     Vice President  (1) California Analysis
2001-02-09  13:49:09  louise.k   andy.z     President       (9) Daily Business
2001-02-16  21:06:00  drew.f     jeff.d     Vice President  (5) California Enron
2001-02-26  22:30:00  james.s    john.l     Vice President  (14) Energy Newsfeed
2001-03-01  07:54:00  diana.s    kate.s     Trader          (5) California Enron
2001-04-06  05:15:00  mike.g     john.l     Manager         (7) Newsfeed California
2001-04-16  06:12:00  richard.s  steven.k   Vice President  (9) Daily Business
2001-05-11  16:02:00  andy.z     john.l     Vice President  (11) Enron Online
2001-06-27  17:44:24  S..S       geoff.s    Vice President  (9) Daily Business
2001-09-05  14:36:53  geoff.s    louise.k   Director        (12) Enrononline Daily
2001-09-15  20:51:20  m..p       louise.k   Vice President  (12) Enrononline Daily
2001-10-04  14:19:16  john.l     louise.k   CEO             (11) Enron Online
2001-10-05  18:49:05  j..k       richard.s  Vice President  (9) Daily Business
2001-10-08  17:50:19  shelley.c  darrell.s  Vice President  (1) California Analysis
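Each row of the table is one communication event: a timestamp and a sender/receiver pair (context) plus a topic label (content). A minimal Python sketch of turning such rows into a time series of attributed graphs, assuming one graph per week and directed sender-to-receiver edges whose attribute is a count of topic labels. The bin width, the edge direction, and the name `weekly_attributed_graphs` are hypothetical choices for illustration, not the authors' construction.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Five rows from the table: (date, time, sender, receiver, topic number).
EVENTS = [
    ("2001-01-02", "04:15:00", "steven.k", "jeff.d", 1),
    ("2001-02-16", "21:06:00", "drew.f", "jeff.d", 5),
    ("2001-02-26", "22:30:00", "james.s", "john.l", 14),
    ("2001-03-01", "07:54:00", "diana.s", "kate.s", 5),
    ("2001-04-06", "05:15:00", "mike.g", "john.l", 7),
]


def weekly_attributed_graphs(events, start):
    """Hypothetical helper: one attributed graph per week since `start`.

    Returns {week: {(sender, receiver): Counter(topic -> count)}}.
    """
    graphs = defaultdict(lambda: defaultdict(Counter))
    for date, time, sender, receiver, topic in events:
        t = datetime.fromisoformat(f"{date}T{time}")
        week = (t - start).days // 7                  # assumed weekly bins
        graphs[week][(sender, receiver)][topic] += 1  # directed edge, topic-count attribute
    return graphs


g = weekly_attributed_graphs(EVENTS, datetime(2001, 1, 1))
for week in sorted(g):
    print(week, {edge: dict(c) for edge, c in g[week].items()})
```

Counting weeks from 2001-01-01, the two events of late February and early March land in the same weekly graph as two separate edges. Changing the bin width or collapsing edge direction only changes the key computed for each event.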
Switchboard Communications Graph


Vertex ~ speaker
Edge ~ dialog


only 
Time Series of Attributed Graphs
Random Attributed Graphs (RAGs)


• There is significant literature on random graphs, ignoring content.

• There is significant literature on stochastic models for language and document streams, ignoring context.

• There is a computer science literature on attributed graphs, e.g., graphs of entities and relations, ignoring stochastic modeling.

• Before this research effort, we knew of no literature addressing time series of random attributed graphs.
Generative Models for RAGs


• Build RAG models by extending random graph models

• Erdos-Renyi (binomial) graphs, where each pair of vertices is connected independently with probability p

• Kidney/Egg models, Block models

• Latent Position and Random Dot Product Models, where vertex i has latent position $x_i$ and

$P_{ij} = h(x_i, x_j)$

• Construct from time series of communication events, where $u_t$ and $v_t$ are the communicating vertices at time t and $s_t$ is the content

$M = \{(t, u_t, v_t, s_t)\}_t$
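The model families above differ only in how the edge probability P_ij is assigned. A short Python sketch, assuming undirected graphs without self-loops and independent edges: Erdos-Renyi uses a constant p, the kidney/egg model raises the probability to q for pairs inside a vertex subset (the egg), and the random dot product model takes h to be the dot product of latent positions, which must then be chosen so that every dot product lies in [0, 1]. All function names are hypothetical.

```python
import random


def sample_graph(n, prob, seed=0):
    """Undirected graph on vertices 0..n-1; pair (i, j), i < j, is an edge
    independently with probability prob(i, j)."""
    rng = random.Random(seed)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < prob(i, j)}


def erdos_renyi(p):
    # Every pair is connected with the same probability p.
    return lambda i, j: p


def kidney_egg(egg, p, q):
    # Pairs inside the egg subset connect with probability q; all other pairs with p.
    return lambda i, j: q if i in egg and j in egg else p


def random_dot_product(x):
    # Latent position model with h(x_i, x_j) = <x_i, x_j>.
    return lambda i, j: sum(a * b for a, b in zip(x[i], x[j]))


# Example: 20 vertices, a 5-vertex egg that is much denser than the kidney.
g = sample_graph(20, kidney_egg(set(range(5)), p=0.1, q=0.8), seed=1)
print(len(g), "edges")
```

Passing the edge-probability rule as a function keeps one sampler for all three families; a block model is the same idea with one probability per pair of blocks.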
Vertex Nomination


• Cf. fraud and social network analysis

- significant literature using graphs

• Intuition for fusing context and content is clear

• Experimental evaluation on Enron email corpus

• Summer workshop

- at JHU Human Language Technology COE

- participants from all over the U.S.
Experimental Methodology


• Given a set of red vertices

• Occlude a subset of the red vertices

• Develop a method for nominating vertices as red

• Evaluate how well it discovers the occluded red vertices

- versus false nominations
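The occlusion protocol can be sketched in Python as follows. The nomination rule (rank the vertices not known to be red by how many known-red neighbors they have) is a hypothetical context-only baseline, not the method developed at the workshop. The evaluation reports the 1-based rank of each occluded red vertex, so every non-red vertex ranked above one of them counts as a false nomination.

```python
import random


def nominate(adj, known_red):
    """Hypothetical context-only baseline: rank the vertices not known to be red
    by their number of known-red neighbors (ties broken by vertex id)."""
    candidates = [v for v in adj if v not in known_red]
    return sorted(candidates, key=lambda v: (-len(adj[v] & known_red), v))


def occlusion_experiment(adj, red, n_occlude, seed=0):
    """Occlude n_occlude red vertices, nominate, and return the 1-based ranks of
    the occluded vertices; non-red vertices ranked above them are false nominations."""
    rng = random.Random(seed)
    occluded = set(rng.sample(sorted(red), n_occlude))
    ranking = nominate(adj, red - occluded)
    return sorted(ranking.index(v) + 1 for v in occluded)


# adj maps each vertex to the set of its neighbors (undirected).
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {5}, 5: {4}}
print(occlusion_experiment(adj, red={0, 1, 2}, n_occlude=1))
```

Repeating the experiment over many random occlusions (different seeds) and averaging the ranks gives a summary that can be compared across nomination methods, for example context-only versus context plus content.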
Enron Example: Red Vertices -> Red Documents

Edge Attributed Graph -> Latent Vertex Attributes

Latent Vertex Attributes live in the 2D simplex


$x_0$ ~ mum-ness
$x_1$ ~ redness
$x_2$ ~ greenness
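One simple way to place vertices in the 2D simplex from an edge-attributed graph is to take, for each vertex, the fraction of its incident edges carrying each attribute. The sketch below assumes every edge carries exactly one of the labels "mum", "red", or "green" (reading "mum" as an edge with neither red nor green content is our assumption) and returns (x0, x1, x2) per vertex. This proportion estimate and the helper names are hypothetical, not the estimator used in the cited work; nominating by redness then means ranking by x1.

```python
from collections import Counter

LABELS = ("mum", "red", "green")  # coordinates x0, x1, x2


def simplex_attributes(edges):
    """Hypothetical estimator: x_v = shares of v's incident edges labeled
    mum / red / green, a point in the 2D simplex."""
    counts = {}
    for u, v, label in edges:
        for w in (u, v):
            counts.setdefault(w, Counter())[label] += 1
    return {w: tuple(c[k] / sum(c.values()) for k in LABELS)
            for w, c in counts.items()}


def rank_by_redness(x, known_red):
    # Nominate vertices not already known to be red, largest x1 (redness) first.
    return sorted((v for v in x if v not in known_red), key=lambda v: (-x[v][1], v))


edges = [("a", "b", "red"), ("a", "c", "green"), ("a", "d", "mum"), ("b", "c", "red")]
x = simplex_attributes(edges)
print(x)
print(rank_by_redness(x, known_red={"b"}))
```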


8/8/2011


Graph Ex


Distribution of 184 Latent Vertex Attributes
(figure; • denotes initial red vertices)
Distribution of 184 Latent Vertex Attributes
(figure: sparse communication graph; • denotes initial red vertices)
Anomalous Chatter Group in Enron Time Series


Induced Egg

Egg?
• Weeks 18-37: p > 0.99
• Weeks 38-57: p ≈ 0.7
• Weeks 58-77: p < 0.01
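The slide reports a p-value for the question "Egg?" in three 20-week windows. As one hypothetical way such a p-value could be computed, the sketch below runs a one-sided binomial test. It estimates the background (kidney) edge probability from all vertex pairs not inside the group and asks how likely at least the observed number of edges inside the group would be under that probability. This is a context-only illustration under an independent-edge assumption, not necessarily the statistic behind the slide's numbers; the function names are ours.

```python
from math import comb


def binomial_upper_tail(k, m, p):
    # P(X >= k) for X ~ Binomial(m, p).
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(k, m + 1))


def egg_p_value(n, edges, group):
    """Hypothetical one-sided test of 'is the subgraph induced by group an egg?'.

    edges: distinct undirected pairs (u, v) on vertices 0..n-1, no self-loops.
    """
    inside = sum(1 for u, v in edges if u in group and v in group)
    m_in = comb(len(group), 2)             # vertex pairs inside the group
    m_out = comb(n, 2) - m_in              # all remaining vertex pairs
    p_out = (len(edges) - inside) / m_out  # kidney probability estimate
    return binomial_upper_tail(inside, m_in, p_out)


edges = [(0, 1), (0, 2), (1, 2), (3, 4)]
print(egg_p_value(6, edges, group={0, 1, 2}))
```

A small p-value, as in weeks 58-77, means the induced subgraph is denser than the kidney probability would plausibly produce; applied to each time window separately, the test yields one p-value per window.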
Conclusions


• New Methods for Fusion of Context and Content

• Pioneered at JHU Human Language Technology COE

• Theory, Algorithms and Experimental Evaluation

• Tasks

- Stream Characterization

- Vertex Nomination

- Dyadic Priors

• Experimentally evaluated on

- Enron email corpus

- Switchboard speech corpus

- other data




Some References


• Statistical Inference on Random Graphs: Fusion of Graph Features and Content, Grothendieck, Priebe, and Gorin, Computational Statistics and Data Analysis, 2010

• Statistical Inference on Random Attributed Graphs: Fusion of Graph Features and Content: An Experiment on Time Series of Enron Graphs, Priebe et al., Computational Statistics and Data Analysis, 2010

• Towards Link Characterization from Content: Recovering Distributions from Classifier Output, Grothendieck and Gorin, IEEE Transactions on Speech and Audio, May 2008

• Vertex Nomination via Content and Context, Coppersmith and Priebe, submitted for publication

• Vertex Nomination via Attributed Random Dot Product Graphs, Marchette, Priebe, and Coppersmith, Proc. International Statistical Institute, 2011

• Latent Process Model for Time Series of Attributed Random Graphs, Lee and Priebe, Statistical Inference for Stochastic Processes, 2011




